Supported Actions and Schemas
This section provides a comprehensive view of the Context API endpoint actions available in Knowledge Enrichment. Each action is documented with a clear description, usage examples, and schemas that help developers interact with the API effectively. For complete technical details on the listed actions, see the accompanying reference sections:
- Classify Image into Categories
- Generate Image Description
- Generate Image Embeddings
- Generate and Match Image Metadata
- Detect Entities in Images
- Extract Entities from Text
- Classify Text into Categories
- Generate Text Embeddings from Documents
- Generate and Match Text Metadata
- Generate Text Summary
Note: If an action is specified with an invalid file format, the request is rejected with validation errors.
For detailed technical information associated with each Context API action, see Technical Information.
Classify Image into Categories
The Image Classification action classifies the input image into pre-defined categories. The following are the requirements:
- Input files must be images.
- The classes array must contain at least 2 distinct non-empty entries. Note: If no classification actions (image-classification or text-classification) are specified, then classes must be null or empty.
How it works:
- You define at least 2 classification categories.
- The AI model analyzes the image content and determines the best-matching category or class.
- The API returns the name of the best-matching classification.
For example, if you provide classes like "damaged_vehicle", "undamaged_vehicle", and "not_a_vehicle", the API might return:
"damaged_vehicle"
Output
Classification result as a string
The output is a single string representing the class label based on the image content.
Schema: image-classification
Attribute | Type | Required | Description | Example |
---|---|---|---|---|
type | string | Yes | Defines the classification category or label type. | "product-label" |
Generate Image Description
The Image Description action analyzes an image and generates a textual description of its contents. The following are the requirements:
- Input files must be images.
- objectKeys must contain only image paths. Note: All objectKeys must be distinct and use valid formats, such as PNG, JPG, or PDF.
How it works:
- The API uses AI models to identify objects, scenes, and activities in the image.
- It synthesizes these elements into a coherent description.
- The result is returned as a natural language text string.
For example:
""A blue Honda CR-V SUV with visible damage to the front bumper parked in a driveway."
Output
String containing the generated description
This output is a single string that provides a descriptive summary of the image content.
Schema: image-description
Attribute | Type | Required | Description | Example |
---|---|---|---|---|
type | string | Yes | A natural language description of the image content. | "A package lying on a desk next to a laptop" |
Generate Image Embeddings
The Image Embeddings action converts visual information into a high-dimensional vector representation. This action requires that the input files are images.
How it works:
- The image is processed through an AI model designed to extract visual features.
- The model converts these features into a dense vector (typically 512-1024 dimensions).
- These vectors place visually similar images closer in the vector space. The result is an array of floating-point numbers.
For example, [0.021, -0.065, 0.127, 0.036, -0.198, ... ]
These embeddings can be used for:
- Finding visually similar images
- Building image search systems
- Clustering similar images together
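For example, the short Python sketch below compares two embedding vectors with cosine similarity, one common way to measure how visually similar two images are. The vectors are truncated placeholders, not real API output.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder embeddings (real vectors typically have 512-1024 dimensions).
embedding_a = [0.021, -0.065, 0.127, 0.036, -0.198]
embedding_b = [0.019, -0.070, 0.131, 0.030, -0.201]
print(f"similarity: {cosine_similarity(embedding_a, embedding_b):.3f}")
```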
Output
Vector of floats
The output is a vector representation of the image.
Schema: image-embeddings
Field | Type | Description | Example |
---|---|---|---|
output | List<float> | High-dimensional vector representation of the image. | [0.123, 0.456, 0.789] |
Generate and Match Image Metadata
The Image Metadata Generation action generates metadata from an input image and matches it against a set of provided metadata examples to return the most relevant results. The following are the requirements:
- Input files must be images.
- objectKeys must contain only image paths. Note: All objectKeys must be distinct and use valid formats, such as PNG, JPG, or PDF.
- kSimilarMetadata must be provided and should contain at least one item. Note: If image-metadata-generation is not specified, then kSimilarMetadata must be null or empty.
Provide kSimilarMetadata in a POST request to enhance the quality of the API response.
Providing multiple example objects is highly recommended, as it helps the API generate more accurate and contextually relevant results. Each item in kSimilarMetadata should be a dictionary (JSON object) containing representative example data.
For example, if the input image shows Times Square in New York City, your request might include metadata like:
{
  "kSimilarMetadata": [
    {
      "event:location": "New Bristol, Terranova",
      "keywords:tags": "economy|markets|Noventis|GEF|Terranova|report",
      "photo:type": "Landscape photography, Nature photography, Macro photography",
      "references:list": [
        "Getty Images: Times Square, New York City",
        "Shutterstock Editorial: Times Square NYC",
        "National Geographic Photo Archive: Times Square",
        "New York Public Library Digital Collections: Times Square",
        "Lonely Planet: Times Square Photo Guide"
      ],
      "summary:text": "This report provides a comprehensive analysis of financial trends and investment opportunities across emerging markets in the Terranova region, focusing on the strategies employed by Noventis Group."
    }
  ]
}
How it works:
- You can provide example metadata structures to guide the generation.
- The model analyzes the image and extracts relevant information.
- It structures the information following your metadata templates. The result is a structured JSON object containing the metadata.
For example:
{
  "car_metadata": {
    "manufacturer": "Honda",
    "model": "CR-V",
    "color": "blue",
    "damage_identified": {
      "car_part": "bumper",
      "damage_type": "cracked",
      "damage_severity": "mild"
    }
  }
}
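The sketch below shows how a request carrying kSimilarMetadata might look in Python. The endpoint URL, token, the actions envelope field, and the example metadata values are assumptions for illustration; only objectKeys, kSimilarMetadata, and the image-metadata-generation action name come from this section.

```python
import requests

# Hypothetical request sketch; the "actions" envelope field, URL, and token are
# assumptions for illustration.
payload = {
    "actions": ["image-metadata-generation"],
    "objectKeys": ["claims/photos/claim-1042-front.jpg"],
    # Example objects guiding the structure of the generated metadata.
    "kSimilarMetadata": [
        {
            "car_metadata": {
                "manufacturer": "Toyota",
                "model": "Corolla",
                "color": "silver",
                "damage_identified": {
                    "car_part": "door",
                    "damage_type": "dented",
                    "damage_severity": "moderate"
                }
            }
        }
    ],
}

response = requests.post(
    "https://api.example.com/context/enrich",  # assumed URL
    json=payload,
    headers={"Authorization": "Bearer <your-access-token>"},
    timeout=30,
)
response.raise_for_status()
print(response.json())  # structured metadata following the example template
```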
Output
Dictionary of generated metadata fields and values
The output is a key-value mapping where each key represents a metadata field (e.g., title, author, date), and each value contains the corresponding data extracted or generated for an image.
Schema: image-metadata-generation
Field | Type | Description | Example |
---|---|---|---|
color | string | The color attribute of the item. | "red" |
shape | string | The shape attribute of the item. | "rectangular" |
barcode | string | Unique identifier typically used for scanning. | "1234567890" |
Detect Entities in Images
The Named Entity Recognition action detects specific entities visible in images, such as people, organizations, and locations. This action requires that the input files are images.
How it works:
- The model analyzes the image to detect text and visual entities.
- It categorizes detected entities into predefined types.
- The result is a structured object containing entity types and values.
For example:
{
  "organization": ["Honda", "CR-V"],
  "person": ["None"],
  "location": ["driveway"],
  "object": ["car", "bumper"]
}
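As one possible way to consume this structure, the Python sketch below flattens the entity dictionary into a tag list, skipping placeholder "None" values. The tag format is a hypothetical convention, not part of the API.

```python
# Flatten the entity dictionary returned for an image into "type:value" tags.
entities = {
    "organization": ["Honda", "CR-V"],
    "person": ["None"],
    "location": ["driveway"],
    "object": ["car", "bumper"],
}

tags = sorted(
    f"{entity_type}:{value}"
    for entity_type, values in entities.items()
    for value in values
    if value and value != "None"
)
print(tags)
# ['location:driveway', 'object:bumper', 'object:car', 'organization:CR-V', 'organization:Honda']
```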
Output
Dictionary of detected entities
The output represents detected named entities extracted from image content.
Schema: named-entity-recognition-image
Field | Type | Description | Example |
---|---|---|---|
output | Dictionary<string, List<string>> | A mapping of entity types to a list of their detected values. Useful when multiple values can be associated with a single entity type. | { "Address": ["123 Main St", "New York"], "Date": ["2024-01-01"] } |
Extract Entities from Text
The Named Entity Recognition Text action identifies and categorizes named entities mentioned in text documents. This action requires that the input files are text (PDF) files.
How it works:
- The model processes the text to identify entities like people, organizations, and locations.
- It categorizes each entity into predefined types. The result is a structured object containing entity types and values.
For example:
{
  "person": ["John Smith", "Jane Doe"],
  "organization": ["Hyland Software", "HR Department"],
  "date": ["2023-06-15", "January 1, 2024"],
  "location": ["Westlake, OH"]
}
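The Python sketch below illustrates one way to post-process this result, parsing ISO-formatted dates and collecting the people mentioned. The review_record shape is hypothetical; only the entity dictionary mirrors the example output.

```python
from datetime import date

entities = {
    "person": ["John Smith", "Jane Doe"],
    "organization": ["Hyland Software", "HR Department"],
    "date": ["2023-06-15", "January 1, 2024"],
    "location": ["Westlake, OH"],
}

def parse_iso_dates(values: list[str]) -> list[date]:
    """Keep only values that parse as ISO dates (e.g. 2023-06-15)."""
    parsed = []
    for value in values:
        try:
            parsed.append(date.fromisoformat(value))
        except ValueError:
            pass  # skip free-form dates such as "January 1, 2024"
    return parsed

# Hypothetical downstream record built from the detected entities.
review_record = {
    "people_mentioned": entities.get("person", []),
    "iso_dates": parse_iso_dates(entities.get("date", [])),
}
print(review_record)
```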
Output
Dictionary of detected entities
The output represents detected named entities extracted from text content.
Schema: named-entity-recognition-text
Field | Type | Description | Example |
---|---|---|---|
Name | List<string> | A list of detected names associated with the entity. | ["John Doe"] |
Email | List<string> | A list of detected email addresses associated with the entity. | ["john.doe@example.com"] |
Classify Text into Categories
The Text Classification action categorizes text into predefined categories. The following are its requirements:
- Input files must be text.
- The classes array must contain at least 2 distinct non-empty entries.
Note: If no classification actions (image-classification or text-classification) are specified, then the classes must be null or empty.
How it works:
- You provide at least two classification classes.
- The model analyzes the text content and determines the best matching class.
- The result is the name of the matching classification.
For example, if you provide classes like "policy_document", "technical_manual", and "marketing_material", the API might return:
"policy document"
Output
Classification result as a string
The output is a single string representing the predicted class label based on the text content.
Schema: text-classification
Field Name | Type | Example | Description |
---|---|---|---|
Type | string | "Invoice" | The class label |
Generate Text Embeddings from Documents
The Text Embeddings action converts text into numerical vector representations that capture semantic meaning. This action requires the input files to be in plain text format.
How it works:
- The text is processed through language models that understand context and meaning.
- The model generates vectors where semantically similar texts are closer together.
- The result is an array of floating-point numbers (typically 768-1536 dimensions).
For example:
[0.041, 0.082, -0.153, 0.027, 0.194, ... ]
Text embeddings enable:
- Semantic search capabilities
- Document similarity comparison
- Content recommendation systems
- Clustering similar documents
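For instance, the Python sketch below ranks stored embeddings by cosine similarity to a query embedding, a basic form of semantic search. The vectors and document names are placeholders, not real API output.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Placeholder embeddings keyed by document name; real vectors are much longer.
corpus = {
    "shipping-policy.pdf": [0.12, 0.34, 0.56],
    "returns-faq.pdf": [0.78, 0.90, 0.11],
}
query_embedding = [0.10, 0.30, 0.60]

ranked = sorted(corpus.items(), key=lambda item: cosine(query_embedding, item[1]), reverse=True)
for name, vector in ranked:
    print(name, round(cosine(query_embedding, vector), 3))
```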
Output
Matrix (list of floats)
The output is a vector representation of the text, typically one vector per sentence.
Schema: text-embeddings
Field Name | Type | Example | Description |
---|---|---|---|
Type | List<List<float>> | [[0.12, 0.34, 0.56], [0.78, 0.90, 0.11]] | A matrix where each element is a float. |
Generate and Match Text Metadata
The Text Metadata Generation action creates structured metadata for text documents, particularly useful for PDFs and long-form content. The following are its requirements:
- Input files must be PDF.
- objectKeys must contain only PDF paths. Note: All objectKeys must be distinct and use valid formats, such as PNG, JPG, or PDF.
- kSimilarMetadata must be provided and contain at least one item. Note: If text-metadata-generation is not specified, then kSimilarMetadata must be null or empty.
Provide kSimilarMetadata in a POST request to enhance the quality of the API response.
Tip: Providing multiple example objects is highly recommended, as it helps the API generate more accurate and contextually relevant results. Each item in kSimilarMetadata should be a dictionary (JSON object) containing representative example data.
For example, if the input file is "Hyland Employee Handbook US Policies", your request might include metadata like:
"KSimilarMetadata" : [
{
"document:title": "Emerging Markets Overview",
"document:date": "2024-11-15",
"entity:company": "Noventis Group",
"entity:organization": "Global Economic Forum",
"entity:person": "Alex R. Minden",
"event:location": "New Bristol, Terranova",
"keywords:tags": "economy|markets|Noventis|GEF|Terranova|report",
"document:type": "market analysis",
"document:category": "Economics & Finance",
"summary:text": "This report provides a comprehensive analysis of financial trends and investment opportunities across emerging markets in the Terranova region, focusing on the strategies employed by Noventis Group.",
"references:list": [
"Terranova Financial Bulletin, Vol. 12","GEF Annual Review 2023","Noventis Group Internal Strategy Memo"]
}
]
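In practice you might assemble kSimilarMetadata from existing catalog records, as in the Python sketch below. The catalog records and their values are hypothetical; only the field-naming pattern and the at-least-one-item rule come from this section.

```python
# Hypothetical catalog records used as metadata examples for the request.
catalog_records = [
    {
        "document:title": "Emerging Markets Overview",
        "document:date": "2024-11-15",
        "document:type": "market analysis",
        "keywords:tags": "economy|markets|report",
    },
    {
        "document:title": "Employee Handbook US Policies",
        "document:date": "2024-01-02",
        "document:type": "policy handbook",
        "keywords:tags": "HR|policy|benefits",
    },
]

k_similar_metadata = [dict(record) for record in catalog_records]
if not k_similar_metadata:
    raise ValueError(
        "kSimilarMetadata must contain at least one item when "
        "text-metadata-generation is requested"
    )
print(f"{len(k_similar_metadata)} example objects prepared for kSimilarMetadata")
```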
Output
The output is a key-value representation of predicted metadata for a PDF.
Schema: text-metadata-generation
Field Name | Type | Description |
---|---|---|
Type | Dictionary of string to object | A key-value pair structure where keys are strings and values can be any data type. |
Example:
{
  "color": "red",
  "shape": "rectangular",
  "barcode": "1234567890"
}
Generate Text Summary
The Text Summary action condenses lengthy documents into summaries that capture key information. This action requires that the input files are text (PDF) files.
The optional maxWordCount parameter specifies the maximum number of words allowed in the text summarization output. The following are its properties:
- Default: 200
- Constraints: Must be a number greater than 0.
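The Python sketch below shows a summarization request with an explicit maxWordCount and a guard for the greater-than-zero constraint. The endpoint URL, token, and the actions envelope field are assumptions for illustration; maxWordCount, its default, and the text-summarization action name come from this section.

```python
import requests

max_word_count = 150
if max_word_count <= 0:
    raise ValueError("maxWordCount must be a number greater than 0")

# Hypothetical request envelope; the "actions" field, URL, and token are
# assumptions for illustration.
payload = {
    "actions": ["text-summarization"],
    "objectKeys": ["handbooks/employee-handbook-us.pdf"],
    "maxWordCount": max_word_count,  # defaults to 200 when omitted
}

response = requests.post(
    "https://api.example.com/context/enrich",  # assumed URL
    json=payload,
    headers={"Authorization": "Bearer <your-access-token>"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # contains the generated summary string
```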
Output
Summary string
The output is a generated summary of the text.
Schema: text-summarization
Field Name | Type | Example | Description |
---|---|---|---|
Type | string | "This document describes the shipping process for international packages." | A textual summary or description represented as a string. |